Hierarchical Spectral Partitioning of Bipartite Graphs to Cluster Dialects and Identify Distinguishing Features

Authors

  • Martijn Wieling
  • John Nerbonne
Abstract

In this study we apply hierarchical spectral partitioning of bipartite graphs to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectology. Besides showing that the results of the hierarchical clustering improve over the flat spectral clustering method used in an earlier study (Wieling and Nerbonne, 2009), the values of the second singular vector used to generate the two-way clustering can be used to identify the most important sound correspondences for each cluster. This is an important advantage of the hierarchical method as it obviates the need for external methods to determine the most important sound correspondences for a geographical cluster.
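The procedure summarized in the abstract can be illustrated with a small sketch. It assumes the standard bipartite spectral co-clustering recipe: normalize the variety-by-feature matrix by its row and column sums, take the second singular vectors, and split their scaled values into two groups with k-means (k = 2). The hierarchy is then obtained by re-applying the split to each resulting sub-matrix. The function names, the stopping rules, and the toy matrix are hypothetical. The abstract does not specify the authors' exact implementation, so this is not a reproduction of it.

```python
import numpy as np


def two_means_1d(z, n_iter=100):
    """Split 1-D values into two groups with Lloyd's algorithm (k = 2)."""
    centers = np.array([z.min(), z.max()], dtype=float)
    labels = np.zeros(z.shape[0], dtype=int)
    for _ in range(n_iter):
        labels = (np.abs(z - centers[1]) < np.abs(z - centers[0])).astype(int)
        if labels.min() == labels.max():  # degenerate: all values in one group
            break
        new_centers = np.array([z[labels == 0].mean(), z[labels == 1].mean()])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def bipartite_split(A):
    """One two-way co-clustering of the rows (varieties) and columns (features).

    Assumes A is nonnegative with no all-zero row or column.
    Returns row labels, column labels, and the scaled second right
    singular vector (one value per feature).
    """
    d_rows = A.sum(axis=1)
    d_cols = A.sum(axis=0)
    # Normalized matrix D1^(-1/2) A D2^(-1/2)
    A_norm = A / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(d_rows)
    z_cols = Vt[1, :] / np.sqrt(d_cols)
    labels = two_means_1d(np.concatenate([z_rows, z_cols]))
    n_rows = A.shape[0]
    return labels[:n_rows], labels[n_rows:], z_cols


def hierarchical_split(A, rows, cols, depth):
    """Recursively split A[rows, cols]; returns a list of (rows, cols) leaves."""
    sub = A[np.ix_(rows, cols)]
    if depth == 0 or min(sub.shape) < 2:
        return [(rows, cols)]
    if (sub.sum(axis=1) == 0).any() or (sub.sum(axis=0) == 0).any():
        return [(rows, cols)]  # normalization undefined; stop here (assumed rule)
    r_lab, c_lab, _ = bipartite_split(sub)
    parts = [(rows[r_lab == k], cols[c_lab == k]) for k in (0, 1)]
    if any(len(r) == 0 or len(c) == 0 for r, c in parts):
        return [(rows, cols)]
    leaves = []
    for r, c in parts:
        leaves += hierarchical_split(A, r, c, depth - 1)
    return leaves


# Toy binary matrix: 6 hypothetical varieties x 6 hypothetical sound correspondences
A = np.zeros((6, 6))
A[:3, :3] = 1
A[3:, 3:] = 1
A[2, 3] = 1  # a single shared feature keeps the graph connected
for variety_idx, feature_idx in hierarchical_split(A, np.arange(6), np.arange(6), depth=1):
    print(variety_idx, feature_idx)
```

`bipartite_split` also returns the per-feature values of the scaled second singular vector. Ranking features within a cluster by those values is one possible reading of the abstract's final claim, but the precise ranking criterion is not given here.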

Similar resources

Analyzing phonetic variation in the traditional English dialects: Simultaneously clustering dialects and phonetic features

This study explores the linguistic application of bipartite spectral graph partitioning, a graph-theoretic technique that simultaneously identifies clusters of similar localities as well as clusters of features characteristic of those localities. We compare the results using this approach to previously published results on the same dataset using cluster and principal component analysis (Shacklet...

Hierarchical bipartite spectral graph partitioning to cluster dialect varieties and determine their most important linguistic features

In this study we apply a hierarchical bipartite spectral graph partitioning method to a Dutch dialect dataset to cluster dialect varieties and determine the concomitant sound correspondences. An important advantage of this clustering method over other dialectometric methods is that the linguistic basis is simultaneously determined, bridging the gap between traditional and quantitative dialectol...

The distinguishing chromatic number of bipartite graphs of girth at least six

The distinguishing number $D(G)$ of a graph $G$ is the least integer $d$ such that $G$ has a vertex labeling with $d$ labels that is preserved only by a trivial automorphism. The distinguishing chromatic number $\chi_{D}(G)$ of $G$ is defined similarly, where, in addition, $f$ is assumed to be a proper labeling. We prove that if $G$ is a bipartite graph of girth at least six with the maximum ...

Bipartite Graph Partitioning and Content-based Image Clustering

This paper presents a method to model the images and their content descriptors in large image databases using bipartite graphs. A graph partitioning algorithm is then developed to cluster the images and their content description features simultaneously such that each cluster is automatically associated with the set of features that best describes its visual contents. The association of features...

Bipartite spectral graph partitioning for clustering dialect varieties and detecting their linguistic features

In this study we use bipartite spectral graph partitioning to simultaneously cluster varieties and identify their most distinctive linguistic features in Dutch dialect data. While clustering geographical varieties with respect to their features, e.g. pronunciation, is not new, the simultaneous identification of the features which give rise to the geographical clustering presents novel opportuni...

Publication date: 2010